Abstract—Artificial neural networks learn how to solve new
problems through a computationally intense and time-consuming
process. One way to reduce the amount of time required is
to inject preexisting knowledge into the network. To make use
of past knowledge, we can take advantage of techniques that
transfer the knowledge learned from one task and reuse it on
another (sometimes unrelated) task. In this paper we propose
a novel selective breeding technique that extends the transfer
learning with behavioural genetics approach proposed by Kohli,
Magoulas and Thomas (2013), and evaluate its performance on
financial data. Numerical evidence demonstrates the credibility
of the new approach. We provide insights on the operation of
transfer learning and highlight the benefits of using behavioural
principles and selective breeding when tackling a set of diverse
financial application problems.

Keywords—transfer learning, artificial neural networks, genetic
algorithms, population studies, behavioural genetics, selective
breeding

I. Introduction

It is fundamental for financial institutions to provide
accurate risk assessments when evaluating new and/or existing
customers (applicants). This risk assessment process should
be continuous and evolve with the customer and/or institution
requirements [?]. The traditional assessment techniques
that financial institutions use include, but are not limited to,
artificial neural networks (ANNs), support vector machines
and complex tree structure predictors [?]. It is wasteful to
use these techniques in isolation, as doing so does not exploit
inter-departmental and/or inter-institutional sharing of valuable
information (knowledge). In this paper we propose
a novel framework that has its roots in behavioural genetic
studies and provides a technique that facilitates the use of
existing customer knowledge (from different departments or
even between institutions) to improve the overall efficiency
and speed of traditional customer evaluation techniques. This
framework stems from previous work by Kohli et al.
[?], [?], with the difference that it uses artificial breeding
techniques between its populations of genetic algorithms to
make this transfer of learning efficient on data with a financial
aspect.

Genetic algorithms are a search heuristic from artificial
intelligence that mirrors the process of natural selection.
They are a sub-branch of the vast class of evolutionary algorithms,
which are inspired by the natural processes of evolution [?]. For
genetic algorithms to behave in an evolutionary way we need to
implement all of the following evolutionary processes: selection,
mutation, inheritance and crossover [6]. In order to take
full advantage of the power of evolutionary algorithms we need
to use populations of abstractions (ANNs as individuals) which
over iterations (time) evolve towards better solutions, without
the need for any external interference.
Traditionally we start from random populations of individuals
and iteratively improve the quality of the population by
evaluating each individual's fitness, selecting only the best
performing individuals and mating them. We stop the evolution
when we reach a generation threshold or when we hit the fitness
target for the population [?].
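
The evolutionary loop described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: individuals are plain lists of floats rather than ANNs, and the names `evolve`, `fitness_fn`, `pop_size` and `target`, the default values and the toy fitness function are all assumptions introduced only for the example.

```python
import random


def evolve(fitness_fn, n_genes=4, pop_size=20, generations=20,
           target=None, mutation_rate=0.001, seed=0):
    """Generic GA loop: evaluate, select the best, mate, repeat.

    Every name and default value here is an illustrative assumption.
    """
    rng = random.Random(seed)
    # Start from a random population of individuals.
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):              # generation threshold
        ranked = sorted(population, key=fitness_fn, reverse=True)
        if target is not None and fitness_fn(ranked[0]) >= target:
            break                             # fitness target reached
        parents = ranked[:pop_size // 2]      # selection: keep the best half
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)   # single-point crossover
            child = a[:cut] + b[cut:]         # inheritance from both parents
            # Mutation: each gene has a small chance of being resampled.
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness_fn)


# Toy fitness (assumption): the larger the sum of the genes, the better.
best = evolve(lambda individual: sum(individual))
```

In the framework discussed in this paper the individuals would be ANNs described by their hyper-parameters, and the fitness would come from training and assessing each network.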

ANNs have been among the best performing classifiers of
financial data used by financial institutions [?]. They have
been tried and tested against statistically sound methods, and
provide one of the most robust approaches to large-scale
classification of financial data known to date [?]. We
have combined genetic algorithms (with novel selection
and mating techniques) and traditional feed-forward ANNs
into a hybrid algorithm. This hybrid algorithm, inspired by
behavioural genetics, is successful in performing transfer learning
on heterogeneous tasks, and is also used to assess, to some
degree, task similarity [?].

In transfer learning the goal is to transfer useful knowledge
from a source task (input) to a target task (output). We
differentiate between two types of transfer paradigms: functional,
where learning happens simultaneously in the source and in the
target; and representational, where tasks are learned at different
times [?], [?]. Most transfer learning research assumes a
relationship between source and target tasks [?], [?], [?]. The
problem is therefore that most transfer learning algorithms
expect source and target tasks to be similar; otherwise they
will not produce positive results [?], [?], [?]. Transfer learning
between different tasks poses a high risk, as it might result
in negative transfer, which damages learning capabilities. In
traditional approaches, the risk of transfer having a negative
impact on learning and performance is dictated by the degree
of similarity between tasks [?], [?], [?]. We focus on the branch
of transfer learning which deals with harnessing the ability to
learn. We move the top performing learners from generation
to generation, until our learning models are highly optimised
for learning. By combining specific breeding techniques with
the population-based behavioural genetics framework we have
achieved positive transfer learning on diverse financial tasks.

The paper is structured as follows. In the next section
we give an overview of behavioural genetics and explain the
main ideas that underpin our approach. In Section III the
proposed methodology for transfer learning is derived and
in Section IV the financial datasets are described. Section V
presents experimental results. The paper ends with concluding
remarks in Section VI.

II. Overview of behavioural genetics 

Behavioural genetics studies the inheritance of traits
between individuals of the same population, and then translates
them into similarities and/or differences. Individual variation
is separated into genetic and environmental components, which
are examined for their influences on animal (including human)
behaviour. In animal studies, breeding and gene knockout
techniques are commonly used to extract the required
information. For humans, the use of twin or adoption studies is
more common. In this paper we have combined both human
and animal studies by using both selective breeding and twin
studies.

We have to stress the importance of the environment in
individual development; this is simulated with the help of a
filter applied to each training set. To get a better appreciation
of the effect genetic and environmental influences have on
network performance, we draw on the research on English past
tense acquisition with populations of identical and fraternal
twins [13]. To provide better variation within populations, we
use monozygotic (MZ, genetically identical twins, 100% same
genetic material) and dizygotic (DZ, genetically similar or
fraternal twins, 50% same genetic material) twins, which in ANN
terms translate to networks with identical hyper-parameters for
MZ and 50% identical hyper-parameters for DZ. Both types of
networks (MZ or DZ) have random initial weights between
their hidden nodes [11].
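
To make the MZ/DZ distinction concrete, the sketch below builds twin genomes from the four hyper-parameters used later in the paper. It is only one possible reading: the `BOUNDS` values are hypothetical, and the way a DZ twin obtains its non-shared half (resampling from the bounds) is an assumption, since the text only states that 50% of the hyper-parameters are identical. Initial weights are not modelled here because they are random for both twin types.

```python
import random

# Hypothetical search intervals; the real ones depend on the dataset.
BOUNDS = {
    "hidden_nodes": (5, 50),
    "learning_rate": (0.01, 0.4),
    "momentum": (0.01, 1.2),
    "logistic_slope": (0.8, 4.0),
}


def sample(name, rng):
    """Draw one hyper-parameter value from its interval."""
    low, high = BOUNDS[name]
    if name == "hidden_nodes":
        return rng.randint(low, high)
    return rng.uniform(low, high)


def mz_twin(genome):
    """MZ twin: 100% of the hyper-parameters are shared."""
    return dict(genome)


def dz_twin(genome, rng):
    """DZ twin: half of the hyper-parameters are shared, the rest resampled."""
    names = list(genome)
    shared = set(rng.sample(names, len(names) // 2))
    return {n: genome[n] if n in shared else sample(n, rng) for n in names}


rng = random.Random(1)
sibling = {name: sample(name, rng) for name in BOUNDS}
mz = mz_twin(sibling)
dz = dz_twin(sibling, rng)
```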

Environmental influences are simulated by a unique filter
which is applied to the training set for each individual of the
population. The filtered training set resembles the notion of
socio-economic status (SES) [?]. Research on the effect of SES
on language acquisition shows that children with lower SES
perform significantly worse at the same task than children
with a higher SES. This happens because of differences in the
amount and quality of learning resources [?]. We ensured
that the equal environment assumption was kept in all the
simulated populations, which means that both MZ and DZ
individual twins share the same environment [?], [?]. Splitting
the populations between 50% MZ and 50% DZ twins not only
ensures good variability but also proves helpful when attempting
to learn multiple unrelated tasks [?].

By incorporating and optimizing selective mating techniques
within the behavioural genetics framework outlined above
we have managed to focus and enhance the transfer of learning
obtained on heterogeneous tasks [?]. We took our inspiration
from the animal kingdom, where certain animals have been
mated by humans for specific traits throughout history (and
still are now) [?].

III. Proposed Transfer Learning Methodology 

In this section we present the key components of our
approach. This is based on a modification of the behavioural
genetics population framework proposed by [?] that is combined
with a new selective breeding technique to enhance transfer
of learning. Before we can use these algorithms we need to
homogenise the feature dimensionality of our datasets,
transforming them into environments with the same dimensionality
for learning algorithms while keeping the internal distribution
of the underlying problem the same. This is a requirement
because we want our populations and individuals to be able
to change environments without any external intervention on
population and/or individual topology. Next we divide the
datasets into training, validation and testing sets, using a
60-20-20 distribution, and we also split the data equally on the
classes we have. This is useful not only from a machine learning
perspective but it also mimics training with a curriculum [13].
It also ensures that the training, validation and testing sets have
the same distribution of class examples and still have elements
of randomness.
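
The 60-20-20 division can be illustrated with a class-wise (stratified) split. The sketch below assumes that "splitting equally on the classes" means each class is divided in the same proportions; the function name `stratified_split` and the toy data are hypothetical. The homogenisation of feature dimensionality is not shown, because the method used for it is not specified here.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (sample, label) pairs into train/validation/test, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, valid, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)                  # randomness within each class
        n_train = int(round(fractions[0] * len(indices)))
        n_valid = int(round(fractions[1] * len(indices)))
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]   # remainder goes to the test set

    def pick(indices):
        return [(samples[i], labels[i]) for i in indices]

    return pick(train), pick(valid), pick(test)


# Toy data (assumption): 100 samples, 2 classes with a 70/30 imbalance.
X = list(range(100))
y = [0] * 70 + [1] * 30
train, valid, test = stratified_split(X, y)
```

With this toy data every subset keeps the 70/30 class ratio of the full set, while the membership of each subset is still random.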

Algorithm 1 takes inspiration from the behavioural genetics
framework developed by Kohli [?] but does not place any
emphasis on heritability. A big part of Kohli's behavioural
genetics framework is dedicated to heritability, which is
abstracted, to some degree, to task relatedness. We have used
a more traditional approach to determine task relatedness, as
outlined in the next section. SES has been randomly applied to
each individual, from every population, in the training phase,
by removing vectors from the training set. SES is random and
set between 0 and 40%, the latter meaning that 40% of the
training set will be removed.
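
The SES filter can be sketched in a few lines. This is a minimal illustration under stated assumptions: the removed fraction is drawn uniformly between 0 and 40%, and the vectors to drop are chosen uniformly at random; the name `apply_ses` and the toy training set are hypothetical.

```python
import random


def apply_ses(training_set, rng, max_removed=0.40):
    """Return a filtered copy of training_set and the fraction removed."""
    removed_fraction = rng.uniform(0.0, max_removed)
    n_removed = int(removed_fraction * len(training_set))
    n_keep = len(training_set) - n_removed
    kept = rng.sample(training_set, n_keep)   # drop vectors at random
    return kept, removed_fraction


rng = random.Random(42)
training_set = [[i, i + 1] for i in range(200)]   # toy vectors (assumption)
filtered, fraction = apply_ses(training_set, rng)
```

Each individual would receive its own call to `apply_ses`, so that every network in a population sees a differently impoverished environment.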

A novel selective breeding technique is presented in
Algorithm 2. The selection part of this algorithm has no natural
basis and has been developed in this way to fully exploit
the variability and relationship between best and medium
level performing twins. The best performing members of the
population are selected for accuracy, and medium level members
are selected to encourage population flexibility. Mixing mid level
individuals with top performers is important for positive
transfer of learning; by choosing only top performers we would
obtain populations that are focused on optimal results and not
on transfer. By mating top performers with medium level ones we
obtain a good synergy between optimum results and transfer
flexibility. The crossover and mutation part of the algorithm is
the standard genetic algorithm random single point crossover
with a 0.1% chance of random mutation.
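
The selection, crossover and mutation steps described above can be sketched as follows. This is an illustration rather than the authors' implementation: genomes are lists of floats, "medium level" is taken to mean the middle of the error ranking, the pairing scheme inside each group is arbitrary, and the production of MZ and DZ twins from each pair is left out. All function names are hypothetical.

```python
import random

MUTATION_RATE = 0.001   # 0.1% chance per gene, as stated above


def select_for_mating(population, errors, n_top, n_mid):
    """Pick the n_top lowest-error individuals and n_mid from the middle."""
    ranked = [individual for _, individual in
              sorted(zip(errors, population), key=lambda pair: pair[0])]
    top = ranked[:n_top]
    start = max(n_top, (len(ranked) - n_mid) // 2)
    mid = ranked[start:start + n_mid]
    return top, mid


def crossover(a, b, rng):
    """Random single-point crossover of two equal-length genomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, low=0.0, high=1.0):
    """Resample each gene with probability MUTATION_RATE."""
    return [rng.uniform(low, high) if rng.random() < MUTATION_RATE else gene
            for gene in genome]


def breed(top, mid, rng):
    """Mate top x top, top x mid and mid x mid pairs (one child per pair)."""
    children = []
    for group_a, group_b in ((top, top), (top, mid), (mid, mid)):
        for a, b in zip(group_a, reversed(group_b)):
            children.append(mutate(crossover(a, b, rng), rng))
    return children


rng = random.Random(7)
population = [[rng.random() for _ in range(4)] for _ in range(12)]
errors = [rng.random() for _ in population]       # toy errors (assumption)
top, mid = select_for_mating(population, errors, n_top=4, n_mid=4)
children = breed(top, mid, rng)
```

Keeping the top x mid and mid x mid groups alongside top x top is what distinguishes this scheme from plain elitist selection.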


IV. Datasets 

All the datasets used in this work have been selected
from the UC Irvine Machine Learning Repository [?] and are
comparable with data published in [14], [15], [?], [?]. Two
of the three datasets are industry standards when it comes to
assessing classifier accuracy on financial data [?], [?], [?],
and have been used to assess transfer of learning in [?].
A short description of the datasets is given below and a
comparison of their characteristics is shown in Table I.
• Statlog (Australian Credit Approval)¹ (Australian) -
credit card applications with a good mix of attributes:
continuous, nominal with small numbers of values and
nominal with large numbers of values, plus some
missing values. This dataset was used in [?], [?], [?], [?].


¹ https://archive.ics.uci.edu/ml/datasets/Statlog+(Australian+Credit+Approval)



Algorithm 1 Hybrid algorithm that exploits behavioural genetics
intuitions in a transfer learning context
Require: datasets ← load all the preprocessed datasets here

for dataset in datasets do
    populations ← initialize 2 populations of the same size with
        random values (between specific intervals, predetermined by
        the distribution of the dataset) for the parameters of each
        individual
    while populations threshold is not reached do
            ▷ we have used 20 generations as our threshold
        brothers ← empty Array
        for population in populations do
            train(population, dataset)
                ▷ only on training and validation sets; this also
                  applies a random SES to the training set
            fitness ← assess(population, dataset)
                ▷ on testing set only
            leaders ← selectForMating(population, fitness)
                ▷ extract top and middle performers from the
                  population
            newPopulation ← mate(leaders)
                ▷ custom breeding method, outlined in Algorithm 2
            brothers ← brothers + split(newPopulation)
                ▷ split them into 2 equal populations (arrays of
                  individuals) so that there are no identical twins
                  in the same population
        end for
        populations ← combine(brothers)
            ▷ create 2 populations (arrays) from the 4 currently
              residing in brothers by merging the populations that
              will not share any identical twins between them
    end while
end for


Algorithm 2 Selective breeding algorithm for transfer of
learning

Require: top ← top performers from a population
Require: mid ← average performers from a population

children ← crossover(top)
    ▷ combine all the top performers in pairs and produce
      4 offspring for each pair: 2 MZ twins, 2 DZ twins
children ← children + crossover(top, mid)
    ▷ combine middle and top performers in pairs and produce
      4 offspring for each pair: 2 MZ twins, 2 DZ twins
children ← children + crossover(mid)
    ▷ combine all the middle performers in pairs and produce
      4 offspring for each pair: 2 MZ twins, 2 DZ twins


• Statlog (German Credit Data)² (German) - credit
data provided by Prof. Hofmann and the numerical
dataset provided by Strathclyde University. Used in
[?], [?], [?], [?].

• Banknote Authentication³ (Banknote) - data extracted
from real images of forged banknotes, with the
help of an industrial camera. Provided by Volker Lohweg
(University of Applied Sciences, Ostwestfalen-Lippe).
Used in [19].


² https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
³ https://archive.ics.uci.edu/ml/datasets/banknote+authentication


Two of the datasets are from the Statlog project. The
Statlog project involves comparing the performance of
machine learning, statistical and ANN algorithms
on datasets from real-world industrial areas including
medicine, finance, image analysis and engineering design. The
two datasets we have chosen from the Statlog project are popular
in the machine learning and the financial world.


Datasets     Inputs   Instances   Attribute types
Australian   14       690         6 numerical and 8 categorical
German       24       1000        numerical
Banknote     5        1372        numerical

TABLE I. Dimensionality of feature space with attribute
types. All targets have 2 classes.

The networks had the same number of hidden nodes
for each task (the other intrinsic parameters were optimally
selected), and the mean weight difference was calculated between
the various tasks, as can be seen in Table II.

To assess task relatedness on the training, validation and
testing sets of data for each task, we have used the mean
difference between the weight space of best performing ANNs of
the same size (same number of input, hidden and output nodes
and layers) [?]. We have chosen an ANN with a fixed 100 hidden
nodes, with the rest of the hyper-parameters (learning rate,
momentum and slope of logistic function) varying to perform
optimally on each of the different tasks. After finding the best
hyper-parameters for each of our 3 datasets, we compute the mean
of the norm of the difference of the weight vectors between
our datasets. The outcome is shown in Table II: the smaller
the difference, the more related the tasks are.
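
The relatedness measure can be illustrated with a small computation. The sketch assumes that each network is represented as a list of weight vectors of identical shape, that the norm is the Euclidean norm, and that the mean is taken over the paired vectors; these are assumptions about details not spelled out here, and the name `mean_weight_distance` is hypothetical.

```python
import math


def mean_weight_distance(weights_a, weights_b):
    """Mean Euclidean norm of the pairwise differences of weight vectors."""
    if len(weights_a) != len(weights_b):
        raise ValueError("both networks must have the same topology")
    norms = []
    for vector_a, vector_b in zip(weights_a, weights_b):
        squared = sum((x - y) ** 2 for x, y in zip(vector_a, vector_b))
        norms.append(math.sqrt(squared))
    return sum(norms) / len(norms)


# Toy weight vectors (assumption): one vector per hidden node.
net_a = [[0.0, 0.0], [1.0, 1.0]]
net_b = [[3.0, 4.0], [1.0, 1.0]]
distance = mean_weight_distance(net_a, net_b)   # (5.0 + 0.0) / 2 = 2.5
```

A network compared with itself gives a distance of zero, which matches the diagonal of the relatedness table.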


Datasets     Australian   German   Banknote
Australian   0.0000       0.0426   0.0448
German       0.0426       0.0000   0.0401
Banknote     0.0448       0.0401   0.0000

TABLE II. The mean of the norm of the difference of the
weight vectors.


V. Experiments and results 

Before any of the experiments can commence we have to
make sure that the data is in the format described in Section III.
We need the data normalized and split into training, validation
and testing sets.

Our results fall into two main parts. First, we obtained the
optimal neuro-computational calibrations for each dataset. This
helped us encode each source task with its optimal ranges and
enabled us to narrow down the initial search space.
Although we can get the best set of hyper-parameters, we
are not after best performance; what we are interested in is
using intervals that center around the best performing
hyper-parameters. This ensures that transfer of learning can
occur, as the source networks will not be focused only on the
source task, and allows a smooth transition of learning
from source to target(s). Sometimes the best hyper-parameter
cannot be interpreted as the center of an interval; in that
case we take the lowest most significant neuro-computational
parameter as the lower bound of the interval and the highest as
the upper bound. We have chosen only 4 hyper-parameters to
optimise: number of hidden nodes, learning rate, momentum and
slope of logistic function, but many more can be incorporated in
this framework. The calibration bounds of the datasets are shown
in Table III.


Datasets     Hidden nodes   Learning rate   Momentum       Logistic slope
Australian   15 to 50       0.01 to 0.2     0.01 to 5.1    1 to 4
German       5 to 30        0.01 to 0.4     0.1 to 1.2     0.8 to 2.1
Banknote     5 to 15        0.01 to 0.15    0.01 to 0.01   0.01 to 1.2

TABLE III. Optimal source task calibrations.
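
As an illustration of how such calibration bounds narrow the initial search space, the sketch below draws a random population inside the intervals listed for the Australian source task. Uniform sampling, the dictionary layout and the function names are assumptions made for the example; only the interval end points come from Table III.

```python
import random

# Calibration bounds for the Australian source task, copied from Table III.
AUSTRALIAN_BOUNDS = {
    "hidden_nodes": (15, 50),
    "learning_rate": (0.01, 0.2),
    "momentum": (0.01, 5.1),
    "logistic_slope": (1.0, 4.0),
}


def random_individual(bounds, rng):
    """Draw one set of hyper-parameters uniformly inside the bounds."""
    individual = {}
    for name, (low, high) in bounds.items():
        if name == "hidden_nodes":
            individual[name] = rng.randint(low, high)   # whole nodes only
        else:
            individual[name] = rng.uniform(low, high)
    return individual


def random_population(bounds, size, rng):
    """Build a population of independently sampled individuals."""
    return [random_individual(bounds, rng) for _ in range(size)]


rng = random.Random(3)
population = random_population(AUSTRALIAN_BOUNDS, size=10, rng=rng)
```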


Once we have the source calibrations, we apply the
two algorithms described in Section III. For the purposes
of these experiments we have used populations of 1200
individuals (ANNs), each trained for 1000 epochs, without
early stopping [20]. The fitness criterion was the overall
misclassification error on the test dataset. We start with
two random populations (within the bounds presented in Table III).
We select a source task from the 3 available
and train the 2 random populations on this chosen task.
We then evolve these populations by following the
steps outlined in Algorithm 1 and Algorithm 2 for 20
generations. This produces populations that are highly optimised
for the chosen source task, yet flexible enough
to transfer the acquired learning knowledge to aid learning of
new tasks. The flexibility is possible because of the selective
breeding pressures outlined in Algorithm 2. We take the 2
populations and breed them into one using the same technique
as before (get top and middle performers and breed them)
but without producing MZ and DZ twins. This produces only
2 non-identical offspring per pair and yields a 1200-individual
population, which is the population that best encodes the
selected source task. We do the same for the remaining 2
source tasks, and now have one population of 1200 individuals
optimised for each of our tasks (datasets).
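
During the 20 generations, Algorithm 1 keeps identical twins apart by splitting each bred population in two and then recombining the halves. One possible way to do this is sketched below. The representation of offspring as explicit MZ pairs, the handling of only MZ twins and the merging rule are assumptions, since these details are not specified in the text; the function names are hypothetical.

```python
def split_twins(twin_pairs):
    """Send the two members of every MZ pair to different populations."""
    first = [a for a, _ in twin_pairs]
    second = [b for _, b in twin_pairs]
    return first, second


def combine(brothers):
    """Merge four half-populations into two without reuniting twins.

    brothers = [a1, b1, a2, b2], where (a1, b1) and (a2, b2) each hold
    the separated twins of one bred population.
    """
    a1, b1, a2, b2 = brothers
    return a1 + a2, b1 + b2


# Toy MZ pairs (assumption): each genome appears exactly twice.
pairs_1 = [([1, 2], [1, 2]), ([3, 4], [3, 4])]
pairs_2 = [([5, 6], [5, 6]), ([7, 8], [7, 8])]
brothers = [*split_twins(pairs_1), *split_twins(pairs_2)]
population_a, population_b = combine(brothers)
```

Each resulting population contains one member of every twin pair, so no population holds two identical genomes.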

The optimised populations are then trained and tested on
each of the tasks, and the results are presented in Tables IV,
V and VI. The top part of each table reports the mean
performance of the population on the validation dataset of
each target task, and the lower part reports performance on
the test set for each target task. When compared with results
from the literature [?], [?], [?], this approach is not the best
performing one. This is because we have focused our efforts
on producing positive transfer of learning, without any negative
impact on the overall process of learning, whilst the approaches
in [?], [?], [?] were conceived to solve one of the problems
outlined in the datasets in isolation. We have succeeded in
solving all problems together and sharing knowledge between
solutions, as one can see from the continuity of the results
in Tables IV, V and VI. The continuity is not only in the
overall classification error from source to target tasks but also
in all the underlying parts that make up that error: there is
no spike or abrupt jump in values between the
true and false positives or negatives. When comparing our
approach with a population (1200 individuals) of randomly
initialised individuals (networks), we can see that our approach
is considerably more accurate; for example, on the Australian
task the test error drops from 37.5 for random initialisation
(Table VII) to between 12.9 and 13.9 with our approach
(Tables IV to VI). This comparison becomes evident
in Figure 1, with reinforcing testing and validation results
presented in Table VII.


Targets             Australian   German    Banknote

Validation set
True positives      330.75       137.5     737.25
True negatives      267.5        606.5     607.75
False positives     39.5         93.5      2.25
False negatives     52.25        162.5     24.75
Precision           0.837        0.79      0.961
Recall              0.871        0.866     0.996
Mean square error   13.297       25.6      1.968

Testing set
True positives      329.75       138.0     736.25
True negatives      264.5        608.75    607.25
False positives     42.5         91.25     2.75
False negatives     53.25        162.0     25.75
Precision           0.833        0.790     0.959
Recall              0.862        0.870     0.995
Mean square error   13.877       25.325    2.077

TABLE IV. Evaluations on the validation (upper) and testing
(lower) datasets for source task: Australian.
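
As a reading aid for the result tables, the short check below recomputes the derived rows from the four counts. The tabulated values are reproduced when precision and recall are computed with respect to the negative class and the error row is read as the percentage of misclassified instances. This reading is inferred from the numbers themselves rather than stated in the text, and the function name `summarise` is hypothetical.

```python
def summarise(tp, tn, fp, fn):
    """Derive the summary rows from mean confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "precision": tn / (tn + fn),       # negative-class precision
        "recall": tn / (tn + fp),          # negative-class recall
        "error_percent": 100.0 * (fp + fn) / total,
    }


# Validation counts for source task Australian, copied from Table IV.
australian = summarise(tp=330.75, tn=267.5, fp=39.5, fn=52.25)
banknote = summarise(tp=737.25, tn=607.75, fp=2.25, fn=24.75)
```

Rounded to three decimals, the Australian column gives 0.837, 0.871 and 13.297, and the Banknote column gives 0.961, 0.996 and 1.968, in agreement with the upper part of Table IV.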


Targets             Australian   German    Banknote

Validation set
True positives      327.5        138.75    737.75
True negatives      268.75       624.0     608.25
False positives     38.25        76.0      1.75
False negatives     55.5         161.25    24.25
Precision           0.829        0.795     0.962
Recall              0.875        0.891     0.997
Mean square error   13.587       23.725    1.895

Testing set
True positives      330.75       127.0     737.5
True negatives      270.0        619.5     608.75
False positives     37.0         80.5      1.25
False negatives     52.25        173.0     24.5
Precision           0.838        0.782     0.961
Recall              0.879        0.885     0.998
Mean square error   12.935       25.35     1.877

TABLE V. Evaluations on the validation (upper) and testing
(lower) datasets for source task: German.


Targets             Australian   German    Banknote

Validation set
True positives      337.5        111.75    738.75
True negatives      263.75       630.5     607.25
False positives     43.25        69.5      2.75
False negatives     45.5         188.25    23.25
Precision           0.853        0.773     0.963
Recall              0.859        0.901     0.995
Mean square error   12.862       25.775    1.895

Testing set
True positives      334.25       136.75    735.5
True negatives      262.25       621.0     607.25
False positives     44.75        79.0      2.75
False negatives     48.75        163.25    26.5
Precision           0.843        0.793     0.958
Recall              0.854        0.887     0.995
Mean square error   13.550       24.225    2.132

TABLE VI. Evaluations on the validation (upper) and testing
(lower) datasets for source task: Banknote.


Targets            Australian   German   Banknote
Validation error   28.155       33.3     2.22
Testing error      37.5         28.7     3.01

TABLE VII. Classification error for randomly initialised
networks.
Fig. 1. Benchmark between randomly initialised networks and our proposed transfer learning approach.
Legend: random validation error; random testing error; Australian source validation error;
Australian source test error; German source validation error; German source test error;
Banknote source validation error; Banknote source test error.






VI. Conclusion 

Diverse financial data has been chosen as the topic of
this work because financial institutions have produced, and
still produce, large amounts of data on a daily basis. In contrast
to approaches proposed in the literature where special methods
are developed to solve a particular problem, we propose a
transfer learning approach that exploits knowledge acquired
in previous situations to learn a new problem. Approaches
based on transfer learning are normally affected by negative
transfer. The proposed framework based on behavioural genetics
offers flexibility when learning different problems, alleviating
the issue of negative transfer in the proposed datasets.
Furthermore, by using selective breeding, as humanity has
done with animals for thousands of years, we have managed
to improve overall accuracy and keep the learning flexibility.
Novel selective breeding techniques injected into an already
successful behavioural genetics computational framework
resulted in optimised positive transfer of learning. Although the
results that we have reported are good from a classification
and transfer point of view, we could still improve this approach
by utilizing more sophisticated selective breeding techniques
and ensemble generation from populations based on individual
class accuracy. In future work we intend to explore these
methods in more depth.